Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve masking #122

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from
Open

Improve masking #122

wants to merge 9 commits into from

Conversation

rhysgt
Copy link
Contributor

@rhysgt rhysgt commented Apr 15, 2024

Masking is now performed on accessing data in the datastore.

Removed the preview function in set_mask since it is now extraneous as the original data is not being overwritten.

We should consider moving cropping and masking from hrdic into base

rhysgt added 4 commits April 15, 2024 14:44
Masking is now performed on accessing data in the datastore
pytest_cases is not compatible with pytest 8 :/
Tests now pass
@rhysgt rhysgt requested a review from mikesmic as a code owner April 15, 2024 14:00
@mikesmic
Copy link
Collaborator

I've made some changes, can you check it works as expected? You can set the mask with:
dic_map.data.generate('mask', mask=bool_array).
How are these masks generated? Is it quite standard or do you change things about to fit the data?
We need to change the mask function so the stored data is not mutated. This can maybe be done by casting as a masked array and then getting the filled array.

rhysgt added 3 commits April 17, 2024 15:53
Replace mutation of stored data with numpy masked array
Better way to set null mask?
@rhysgt
Copy link
Contributor Author

rhysgt commented Apr 17, 2024

Have fixed the problems you stated I think - using a masked array instead of mutation and now generates a null mask in a better way (?)

Moat of the time, the masks I used are quite straightforward, for example (from docs):

To remove data points in dic_map where max_shear is above 0.8, use:

mask = dic_map.data.max_shear > 0.8

To remove data points in dic_map where e11 is above 1 or less than -1, use:

mask = (dic_map.data.e[0, 0] > 1) | (dic_map.data.e[0, 0] < -1)

To remove data points in dic_map where corrVal is less than 0.4, use:

mask = dic_map.corr_val < 0.4

@rhysgt
Copy link
Contributor Author

rhysgt commented Apr 18, 2024

Also - there is an inconsistency in function naming - calc_mask and set_crop?

@mikesmic
Copy link
Collaborator

I did call it set_mask but it would be confusing because it doesn't set anything, it just creates a mask image that the generate function uses. set_crop actually sets crop boundary values. I don't know about passing out masked arrays, will they work with everything else in the library? Although I looked at masking yesterday and I couldn't find a way to create an array with nans set for masked values without making a copy of the data. I need to look through the logic for the making again, my goal was to only run the masking function if a mask is set.

@rhysgt
Copy link
Contributor Author

rhysgt commented Apr 18, 2024

As far as I'm aware, everything still seems to works as expected with a masked array.

It does incur an overhead (but much smaller by a factor 1000 than the previous method).

Do we need to change the logic to not use masked arrays for data that isn't masked? Currently a masked array is always generated.

@rhysgt
Copy link
Contributor Author

rhysgt commented Apr 24, 2024

A masked array is only returned when a mask is provided. If unset, the normal map data is passed through as before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants